Goto

Collaborating Authors

 Ruse


Constructing Cross-lingual Consumer Health Vocabulary with Word-Embedding from Comparable User Generated Content

Chang, Chia-Hsuan, Wang, Lei, Yang, Christopher C.

arXiv.org Artificial Intelligence

The online health community (OHC) is the primary channel for laypeople to share health information. To analyze the health consumer-generated content (HCGC) from the OHCs, identifying the colloquial medical expressions used by laypeople is a critical challenge. The open-access and collaborative consumer health vocabulary (OAC CHV) is the controlled vocabulary for addressing such a challenge. Nevertheless, OAC CHV is only available in English, limiting its applicability to other languages. This research proposes a cross-lingual automatic term recognition framework for extending the English CHV into a cross-lingual one. Our framework requires an English HCGC corpus and a non-English (i.e., Chinese in this study) HCGC corpus as inputs. Two monolingual word vector spaces are determined using the skip-gram algorithm so that each space encodes common word associations from laypeople within a language. Based on the isometry assumption, the framework aligns two monolingual spaces into a bilingual word vector space, where we employ cosine similarity as a metric for identifying semantically similar words across languages. The experimental results demonstrate that our framework outperforms the other two large language models in identifying CHV across languages. Our framework only requires raw HCGC corpora and a limited size of medical translations, reducing human efforts in compiling cross-lingual CHV.


Polar Ducks and Where to Find Them: Enhancing Entity Linking with Duck Typing and Polar Box Embeddings

Atzeni, Mattia, Plekhanov, Mikhail, Dreyer, Frédéric A., Kassner, Nora, Merello, Simone, Martin, Louis, Cancedda, Nicola

arXiv.org Artificial Intelligence

Entity linking methods based on dense retrieval are an efficient and widely used solution in large-scale applications, but they fall short of the performance of generative models, as they are sensitive to the structure of the embedding space. In order to address this issue, this paper introduces DUCK, an approach to infusing structural information in the space of entity representations, using prior knowledge of entity types. Inspired by duck typing in programming languages, we propose to define the type of an entity based on the relations that it has with other entities in a knowledge graph. Then, porting the concept of box embeddings to spherical polar coordinates, we propose to represent relations as boxes on the hypersphere. We optimize the model to cluster entities of similar type by placing them inside the boxes corresponding to their relations. Our experiments show that our method sets new state-of-the-art results on standard entity-disambiguation benchmarks, it improves the performance of the model by up to 7.9 F1 points, outperforms other type-aware approaches, and matches the results of generative models with 18 times more parameters.